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Abstract 

Background: Identifying gene regulatory network (GRN) from time course gene expression data has attracted 
more and more attentions. Due to the computational complexity, most approaches for GRN reconstruction are 
limited on a small number of genes and low connectivity of the underlying networks. These approaches can only 
identify a single network for a given set of genes. However, for a large-scale gene network, there might exist 
multiple potential sub-networks, in which genes are only functionally related to others in the sub-networks. 

Results: We propose the network and community identification (NCI) method for identifying multiple subnetworks 
from gene expression data by incorporating community structure information into GRN inference. The proposed 
algorithm iteratively solves two optimization problems, and can promisingly be applied to large-scale GRNs. 
Furthermore, we present the efficient Block PCA method for searching communities in GRNs. 

Conclusions: The NCI method is effective in identifying multiple subnetworks in a large-scale GRN. With the 
splitting algorithm, the Block PCA method shows a promosing attempt for exploring communities in a large-scale 
GRN. 



Background 

Rapid advances in high-throughput DNA microarray 
technology generate a huge amount of time course gene 
expression data which, in turn, calls for efficient compu- 
tational models to characterize the network of genetic 
regulatory interactions. A number of methods have been 
proposed to infer GRNs from gene expression data. 
Boolean networks [1] use two states, "ON" or "OFF" to 
represent the state of each gene, and each state at the 
next time step is determined by Boolean logical rules. 
Bayesian Networks [2] infer causal relationships between 
two genes according to conditional probability functions. 
The stochastic nature makes them more accurate in 
modeling the dynamics and nonlinearity of gene regula- 
tion in large-scale systems. Bayesian Networks, however, 
usually do not include cycles and, thus, are difficult to 
deal with feedback motifs. Ordinary differential 
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equations (ODEs) models [3-5] overcome this problem 
by modeling GRNs as a set of differential equations. 
Some other models such as signed directed graphs, mul- 
tiple regression, state space model, etc., are addressed in 
the survey [6]. 

Whereas most of the existing work focuses on small- 
sized GRNs, limited attention has been given to interac- 
tions among large scale genes. Conventional approaches 
are usually designed for the network with connectivity 
less than a small fixed number [7]. Computational com- 
plexity is a major obstacle in reconstruction of large 
scale GRNs as determining the parameters in such a 
network is time-consuming. Sparsity is a common 
assumption used in modeling GRNs to reduce the com- 
putational complexity. Typically, in a sparse network, 
one gene interacts with only a couple of genes [7]. 

Recently, Yuan et al. [8] proposed a directed partial 
correlation (DPC) method for regulatory network infer- 
ence on large-scale gene data. The DPC method com- 
bines the directed network inference approach and 
Granger causality concept for causal inference on time 
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series data to reconstruct large-scale GRNs. Although 
modular discovery was provided by biclustering in gene 
expression data, the DPC method cannot present multi- 
ple sub-networks simultaneously. 

We propose the NCI method for subnetwork identifi- 
cation by detecting community structures from large- 
scale gene expression data. Usually, GRNs have commu- 
nity structures: genes in the same groups are found with 
high density of "within-group" interactions and genes in 
different groups with low density of "between-group" 
interactions [9]. Many algorithms have been proposed to 
detect community structures by clustering [9-15]. To 
accomodate the large-scale GRN inference, we particu- 
larly propose a block principal component analysis 
(Block PCA) method, which explores community struc- 
ture information for the NCI method. 

The NCI method repeats two steps: (1) N-step: iden- 
tify possible gene regulatory networks; (2) C-step: esti- 
mate community structure. At the N-step, a convex 
quadratic programming, formulated for the community 
structure, is solved to infer possible GRNs. This quadra- 
tic programming can be identically divided into n (the 
total number of genes) sub-problems, each of which has 
a much smaller dimension, and, thus, adapt to large- 
scale networks. At the C-step, the NCI method esti- 
mates community structure from the GRNs identified at 
the first step. When the algorithm terminates, a network 
with community structures is obtained. 

Methods 

An ODE model for GRNs 

The processes of transcription and translation in a GRN 
consisting of n genes can be modeled as the following 
dynamic system: 



x = Cx + Sr 
r=f{x), 



(1) 



where vector x the concen- 

tration of mRNAs of n genes, C = diag [-c ir c 2 , -, - c„] 
e R n * " c ; represents the degradation rate of gene i, r 
= [fi, r 2 , r n ] T e R n is the reaction rates which is a 
function of concentrations of some mRNAs, and matrix 
Se R" x * represents the stoichiometric matrix of the 
biological network. For simplicity, one can assume the 
reaction rate r is a linear combination of mRNAs con- 
centrations. Let Fe R" * " be the coefficient matrix. 
Then, 



r = Fx. 

By substituting (2) into (1), we have 
x — Cx + SFx. 



(2) 



(3) 



A standard discretization of system (3) by using the 
zero-order hold method on m observation points for a 
given sampling time At is 



x(k+ 1) =Ax(k), 



(4) 



where A = e CAt + (e CAt - T)C l SF. 

Let X e R" * m be a matrix of gene expression data, 
with the columns being the measured gene expression 
levels at m time points, and n being the number of 
genes. Let X 1 and X 2 be the sub-matrices of X made up 
by the first m - 1 columns and last m - 1 columns of X, 
respectively. According to [16], the gene regulatory net- 
work can be inferred by solving the following optimiza- 
tion problem: 



min A || AX! -X 2 \\l 
s.t. A is stable, 



(5) 



where ||-||2 is the Euclidean norm. Stability is usually 
used as a criterion to determine the qualification of the 
inferred GRN. For discrete models, A = {a tl ) n x m is 
stable if 



| fly | < 1, for all i = \, ... ,n. 



(6) 



Moreover, since the network is commonly recognized 
as sparse, / x regularization is added to Eq. (5) to obtain 
a sparse matrix A. Hence, with the sparsity and stability 
conditions, (5) becomes 



min^ 



(7) 



IIAXi -X 2 ||l + kIIAIU, 

En . . 
1 < 1, for all i = \, ... ,n 

i = Zi, j \aij\ is the l x 



where y is a positive scalar, 
norm of matrix A. 



The NCI method 

Since rows of A are independent in the objective func- 
tion and constraints, problem (7) can be divided into n 
sub-problems and solved individually [16]. However, 
such a solution does not consider the information of 
community structure which implies multiple subnet- 
works. In this section, we propose the NCI method to 
overcome this problem. An observation is that interac- 
tions between genes in a community occur more fre- 
quently than those between different communities. We 
introduce a weighted matrix W = (w,y)„ x m to distin- 
guish genes in the communities with those outside. Wy 
is assigned a small positive value or zero if gene i and /' 
located in the same community; a relatively large value, 
otherwise. 
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By adding term (W, |A|) to (7), we have 

min ||AX 1 -X 2 ||f + y||A|| 1 +/i(W,|A|) 
s.t. y i |a,j-| < 1, for all i = 1, . . . , n, 



(8) 



Where ^ > 0 is a penalty parameter, |A| = (|«y|)„ x m , 
<W, |A|) = trace (W T \A\) = H u j w^a^. All elements of 
matrix W are nonnegative. 

We propose a clustering method, named Block PCA, 
to update weight matrix W. With Block PCA, we can 
obtain matrix L*, reflecting the community structure of 
its corresponding network. Then, weight matrix W can 
be updated by 



W = l n , n - L* 



(9) 



Where 1„,„ e R" xn is the matrix with all l's. 

For example, consider a network with five nodes and 

^1 1 1 0 (T 
1110 0 
1110 0 
0 0 0 1 1 
0 0 0 1 1 

Node 1, 2, and 3 form a community, and node 4 and 
5 form another community. Particularly, we apply sparse 
singular value decomposition (SSVD) [17] on a general 
L* to identify the communities in GRNs. The NCI 
method is summarized in Algorithm 1. 

Some additional details about Algorithm 1: 

1. Stop criteria. The NCI algorithm stops when either 
of the following two criteria meet. (1) Weighted matrix 
W converges, that is, 1 1 W^ k) - W<* +1) || < tol for a pre- 
defined constant tol > 0, where W**' denotes W at itera- 
tion k; (2) The number of iteration reaches the 
threshold. 

2. The efficiency of the algorithm mainly depends on 
the estimation of the community structure of the under- 
lying GRN. Since matrix A in (8) provides a base for 
estimation of community structure L* computed by (12), 
a poor estimation of A may result in an inaccurate W. 
Hence, instead of using only one estimation of A, we 



average out the errors by calculating a series of estima- 
tions with different arguments y in (8) and combining 
them together. More specifically, we choose a set of y 1( 

y q and compute the corresponding solutions A 1 , 
A q . Then, A in Step 1 of Algorithm 1 is set as 



a 



with p ■ 



argmax{|A"||u = 1, . . . ,cj},Vi,j. 



After the iteration terminates, model (8) is solved 
again to compute the matrix A with y = y„ where y r , is a 
parameter in Algorithm 1. 

3. The complexity of the subproblem is our primary 
concern about the design of the NCI algorithm. Since 
the subproblems may be called iteratively in Algorithm 
1, the complexity of the NCI algorithm is determined by 
those subproblems. Both sub-problems (8) and the 
Block PCA model are convex, and can be efficiently 
solved by CVX [18] and and the proposed splitting algo- 
rithm, respectively. As aforementioned, model (8) is 
dividable: rows of A in the objective function and the 
constraints are independent. Hence, it is equivalent to n 
sub-problems: 



min a . eR n 
s.t. 



\x\ai 



- X24 1 
1 < 1 



+ Y(i^/yivi+ 1, kl) 



(10) 



for i = 1, n, where X 2 i is the i-th row of the matrix 
X 2 , t = [1, 1] T e R n ,wJ,aJ is the i-th row of W 
and A, respectively. Sub-problem (10) can be trans- 
formed into a standard (convex) quadratic program- 
ming, and solved by software packages such as Mosek 
or CVX [18]. 

The Block PCA model 

The Block PCA model is motivated by Robust PCA 
model [19] 



mm L , E 
s.t. 



\\L\U + k\\E\U 
D = L + E, 



(11) 



which is the nuclear norm of the matrix L, and 

D is a given matrix. 



Algorithm 1 

Algorithm 1: The NCI algorithm 
Input: X; 

Output: matrix A and communities of the GRN; 

Step 0. (Initiation) W : = 0 e R" x ". Select y, > 0, y t >0, X,e (0, X 2 ). 

Step 1. (N-step: identify possible Networks) Solve (8) to find an approximate matrix A. 

Step 2. (C-step: estimate Community structure) Calculate weighted matrix 1/1/1 by Eq. (13), then solve the proposed block PCA model (12) to 
calculate L*. 

Step 3. (Update weight matrix) Update W by Eq. (9). If stop criteria are not satisfied, go to Step 1. 
Step 4. (GRN identification) Identify the communities of the computed GRN by SSVD. 
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The block PCA aims to seek a block submatrix in D 
by solving optimization problem 



min L , B IILIU + MWl, |I|)+A. 2 ||£||i 
s.t. D = L + E 



(12) 



where Wl is a weight matrix with all elements 
nonnegative. 

In Block PCA, D e R"' " is set to be matrix 
l n ,„ e R nx " with all l's, X 2 is constantly set as 1/01, 
and lie (0, X 2 ). For a network with n nodes, we define 
weight matrix Wl = (wl,y) H x „, where 



wly = iPij/Po) 2 , 



(13) 



is the length of the shortest path between the node 
i and /, p 0 > 0 is a parameter less than the diameter of 
the network. 

As in Robust PCA (11), the nuclear norm || • ||. 
usually induces a low rank matrix and the l x norm 1 1 • 1 1 
i induces a sparse matrix [19,20]. The constraint D = L 
+ E enforces to split matrix D into a low rank matrix L 
and a sparse matrix E. Different with Robust PCA, the 
Block PCA adds an extra term X l {Wl, \L\) = A x £wl„ r | 
Lij\ to (11). The nonnegative weight matrix Wl stands 
for the prior knowledge about low rank matrix L. 

Splitting algorithm for solving Block PCA 

Block PCA model (12) can be transformed to a linear 
semidefinite programming (SDP) 



mm 

W V W 2 ,L,E 



S.t. 



-[trace(Wi) + trace(W 2 )] +A.i(Wl,|L|) 



U 2 <1„,„, \E\) 

L 

W 2 

L + E = D. 



Wi 



(14) 



> 0; 



However, this transformation increases the size of the 
variable matrix from n x n to 2« x In. Existing SDP solvers 
such as CVX [18] can not solve large-scale SDP problems. 
Instead, we solve Block PCA problem (12) by extending 
the splitting method [21] for optimization problem 



Em 

Em 
^ AiXi = b. 



(15) 



Where 6 t : R"' R are closed convex functions, 
Ai e R ix "<, bcR 1 . 
Note that Block PCA (12) can be recast as 

min \\L\\, + X 1 (W1, |U|> + X 2 \\E\U 

L,E,U 



S.t. 



D 




I 


+ 


E 




0 


0 




L 




0 




u 



(16) 



By letting ^(-): = ||-||., 0 2 (-): = *i<Wl, \-\), f? 3 (-): 



A 2 ||-||i. and b ■■ 



,Ail ■■ 



,A 2 U 



and 



A^E 



Block PCA (12) can be treated as a gener- 



alized case of (15) with matrix variables L, E, U and lin- 
ear operators A\, A 2 , A3. 

Under the framework of [21], we next present an 
implementable splitting algorithm for the Block PCA 
model (12). 

Define operator 



Ss[t] 



t — s, if t > e, 
t + s, ift < —s, 
0, otherwise, 



(17) 



(e I and e > 0. It can be extended to an arbitrary 
matrix X e R"' " by applying element-wise operation, 
denoted by S E [X] . 

Consider the sigular value decompostion (SVD) of the 
matrix X 



X= WEV T , 



(18) 



where U and V are orthogonal matrices consisting of 
singular vectors, and 2 is the diagnal matrix made up of 
the singular values. For each x >0, the soft-thresholding 
operator V x is defined as [22] 



T> X (X) = U5 r (E)V T . 

More generally, for a matrix W < 
ments nonnegative, we define 

S W [X] = (%), Xy = S Wij [xij]. 



(19) 

t"' " with all ele- 
(20) 



Particularly, if W is the matrix with all elements 1, 1 1 
X\\ w degenerates to ||X||i, and 5 E w[X] degenerates to 

S e [X]. 
Let > 0, fi > 2, A k : 



, where A*, A k 2 e R n,n . 



Then, for the calculated (L k , E k , if, A k ), the steps for 
each iterative (L k+1 , E k+1 , U k+1 ,A k+1 ) for solving (12) are 
as follows. 

Step 1. Solve L k+1 by the following problem. 



min L I |L| U - (a* + A\, l) + 1 1 \L + E k - D\ \ 



+ j\\L-U k \\ 2 



(21) 



By Theorem 2.1 in [22], the unique solution of (21) is 
L k+1 =V T [Y], (22) 
where r = ^, Y = \\D - E k + U k ] + ± [a? + A*] . 
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Step 2. Update the Lagrangian multiplier 







fe+- 




At 2 


- Aj 


1 




k+- 




A 2 2 


- A* 



rfe+l 



= A j — p(L k+1 + E k — D), 



(23) 



fe+- 



A, 



= A k - /S(L fe+1 - l/ fe ). 
Step 3. Solve £* +1 by the following two problem. 
1 



k+- 



min{||l% lWl+ <A 2 ^, U) + ^\\U-U k \\ }, 



min{||E||i - N — , M 

£ A 2 



a! +2 ,£> + ^ 



By the property of the operator S T [Y] shown in [23], 
U fe+1 =S rXlWl [U], (24) 



1 

where , - , , fe+ T 

£ fe+1 =S a [E], 

1 

where , - , . 

a - ~ £ + £ M i • 



(25) 



Step 4. Update the Lagrange multiplier A k+1 by L k+1 , 
E k+1 . 



and 



AL fe || 2 + ||A£ fe || 2 + ||AL/ fe || 2 < e 2 , 



(28) 



L* A£* 



for tolerance S\ >0, £2 >0, where AL 
E k+1 - E k , Alf = lf +1 - U k . 

The splitting algorithm for solving Block PCA model 
is summarized in Algorithm 2. 

In Algorithm 2, arguments j3 and ji are currently set 
constant. Adaptive settings of these arguments may 
speed up the convergence. The discussion of this issue 
in a simple case can be referred to [24]. 

Results and discussion 

We examine the NCI method based on two synthetic gene 
regulatory networks with different sizes. The GRN in first 
test is a small-sized network consisting of 14 genes and 27 
interactions. There exist two communities in this GRN. In 
the second test, the network consists of 50 genes and 100 
interactions and the data come from the Artificial Gene 
Network database [25]. Since the gene network is syn- 
thetic, the corresponding matrix A in (5) is known before- 
hand. We solve the GRN by the NCI method and 
compare it with A to evaluate the performance of the algo- 
rithm. Moreover, we examine the performance of the pro- 
posed splitting algorithm in the third test. 

The metric used in the performance examination was 
introduced in [16]. It compares the signs of the esti- 
mated matrix A e with A. The accuracy is defined as 



accuracy = (r n + r 2 2 + r 33 ) /n , 



(29) 



fe+i 



A * — p(L k+1 + E k+1 — D), 



, fe+i 



A* - p(L k+1 - U k+1 ). 



The algorithm can be terminated when 



D - L k - E k \\ 2 + \\L k - U k \\ 2 



IIDII 



< £1 



where r llt r 2 2, and r 33 are the number of correctly 

(26) identified positives, zeros and negatives, representing 
promotions, repressions, and no interaction, respectively. 

The algorithm runs on a computer with Pentium (R) 
dual-core CPU E5200 2.50GHz, and RAM 2.0GB. The 
parameters of the algorithm are chosen as follows. In 

(27) Test 1, 7 in problem (8) is chosen from {0.05, 0.02, 
0.008} to find possible GRNs and y r = 0.02. In Test 2, 7 
is chosen from {0.02, 0.005, 0.001} and y r = 0.005. In the 



Algorithm 2 



Algorithm 2: Splitting algorithm 



Input: 1/1/1. 






Output: 


low rank matrix L. 




Step 0. 


Initiation. Set k = 0, p > 0, fj > 2, D = !„,„. Set £, >0, s 2 >0, X 2 = 


1/y/n, A, e (0, \ 2 ), L° = D, £° = 0, if = D. 


Step 1 


Solve by Eq. (22). 

Update the Lagrangian multiplier ^k+J by Eq. (23). 




Step 2. 




Step 3. 


Solve U k+ \ £* +1 via Eq. (24) and (25). 




Step 4. 


Update the Lagrange multiplier A** 1 via Eq. (26). 




Step 5. 


Terminate if the stop criteria (27) and (28) are satisfied; otherwise, k: 


= k + 1, goto Step 1. 
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(A) Noise free 



(B) Noise case 
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(D) Noise case 



30 



NCI 

NCI-single 




O 

O*' O g 

q^j o ***** 
* 


o 

°** 
O* * cF* 
* * ( 



30 



10 20 30 0 10 20 

experiments experiments 

Figure 2 Accuracy of NCI with the 30 runs. The accuracies of the NCI method and SGN method are shown in (A), (B) with 30 random 
experiments. (C), (D) shows the efficiency of searching multiple possible GRNs at N-step of NCI method. "NCI" indicates normal NCI method 
searching multiple possible GRNs, "SGN" denotes the sparse gene regulatory network method [16], and "NCI-single" indicates the NCI method 
searching a single GRN at N-step. In (A) and (C), the expression levels are accurate, while in (B) and (D) 10% elements of the expression levels 
are incorporated with Gaussian noise with zero mean and unit variance. 
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first two tests, n is chosen as 10/ for problem (8), k\ as 
0.2X 2 in the Block PCA model, and p 0 as jd for Eq. (9), 
where d is the diameter of the corresponding network. 
The algorithm terminates in 3 iterations. 

Test 1. A small gene network with 14 genes 

Figure 1 shows the network and its two communities. 
The diameter of this gene network is 6. We choose dif- 
ferent initial gene expression levels randomly for 30 
times. The corresponding 30 accuracy rates of the calcu- 
lated GRN are shown in Figure 2(A). "NCI" and "SGN" 
denote the NCI algorithm and sparse gene regulatory 
network method [16], respectively. Compared with the 
SGN algorithm, the NCI significantly improved predic- 
tion accuracy. In the noise case, 10% elements of the 
gene expression matrix X are incorporated with Gauss 
noise with zero mean and unit variance. The accuracy 
rates of two methods are shown in Figure 2(B). In both 
of the noise-free (Figure 2(A)) and noise cases (Figure 2 
(B)), the NCI method has much better performance in 
most of the 30 runs. In the noise-free case, the NCI 
algorithm increases the average accuracy from 78.8% to 
83.5%. In the noise case, the NCI algorithm increases 
the average accuracy from 87.3% to 88.9%. 

To show the effectiveness of the NCI method at N- 
step in searching multiple possible GRNs, we compare 
the accuracy rates with the results of one iteration at N- 
step (y = Yz a t N-step). As shown in Figure 2(C), the 
average accuracy is improved from 78.5% to 83.5% in 
the 30 runs. In noise case (Figure 2(D)), the average 
accuracy is improved from 87.6% to 88.9%. Thus, a 
number of iterations at N-step is necessary for finding 
accurate GRNs with the NCI algorithm. 



Table 1 Characteristics of webl 00-023 network 



number of vertices 


50 


number of arcs 


100 


density 


0.04 


in-degree center 


Node 1 


diameter 


10 


out-degree center 


Node 1 


characteristic path length 


13.0612 


closeness center 


Node 14 


average clustering coefficient 


0.0747 


betweeness center 


Node 1 



These statistics come from [27]. 



Test 2. A gene network with 50 genes 

The network in the second test consists of 50 genes and 
100 interactions (See Figure 3(A)). Network statistics are 
listed in Table 1. The nodes in red in Figure 3(A)) form 
a unique community. The inferred network by NCI 
algorithm contains 41 genes and 87 interactions. As 
shown in Figure 3(B), the community identified by the 
NCI algorithm has a very large overlap with the true 
community. Among 34 genes in the true community, 23 
important ones (with large in-degree and out-degree) 
are successfully identified. 

Test 3. The performance of the Block PCA and splitting 
algorithm 

The following experiments are specially designed to test 
the efficiency of the Block PCA method and the perfor- 
mance of the splitting algorithm as well. We randomly 
generate three clusters with 30 points (See Figure 4(A)). 
Three clusters calculated by K-means are shown in Fig- 
ure 4(B). Based on the distances between these points, 
matrix Wl is calculated by Eq. (13) with p 0 = 0.684. 
The results of the splitting algorithm are shown in Fig- 
ure 4(C). The corresponding three clusters calculated by 
SSVD are displayed with different colors in Figure 4(D). 
Among 30 data points, two points (the point 25 and 30) 



(B) 




(A) 




Figure 3 Inference of the webl 00-023 gene regulatory network. The webl 00-023 gene network consists of 50 genes and 100 interactions 
which is shown in (A), where "— >" indicates promotion interaction, while *-| " indicates repression. The 34 nodes in red in (A) form a unique 
community. With the 34 genes in the true community, the community identified by the NCI algorithm has an overlap of 23 important genes 
(with large in-degree and out-degree). The overlap is indicated by nodes in red in (B). 
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Figure 4 Clusters produced by Block PCA and K-means In (A), 
30 points are randomly generated on a plane. Three clusters 
calculated by K-means are shown in (B). The low rank matrix 
calculated by the splitting algorithm is depicted in (C). The 
corresponding three clusters calculated by SSVD are displayed with 
different colors depicted in (D). Among 30 data points, two points 
(the point 25 and 30) are outliers, not included in any cluster. The 
clustering result of the remaining 28 points is identical with that of 
K-means. 



are outliers, not included in any cluster. The clustering 
result of the remaining 28 points is identical with that 
of K-means. 

To verify the effectiveness of the argument A 1( we 
choose different values and the calculated low rank 
matrices L are shown in Figure 5. It is shown that the 
number of nonzero elements of L (white pixels of the 
images) decreases, as Ai increases. The numbers of non- 
zero elements of L are 804, 485, 284 and 100, with Ai = 
0.2A 2 , 0.4A 2 , 0.6A 2 , and 0.7A 2 in Figure 5(A), (B), (C) and 
5(D), respectively. 

We compare the performance of the splitting algo- 
rithm with CVX and SDPNAL [26] by which the Block 
PCA model is solved via the SDP formulation (14). The 
results are listed in Table 2, where "Points30" indicates 
calculating on the data of 30 points on a plane, "funVal" 
indicates the calculated objective function value for the 
Block PCA model, "split", "cvx" and "sdpnal" indicate 
splitting method, CVX and SDPNAL, respectively. It is 
shown in Table 2 that splitting algorithm outperforms 
others in all the tests. 

Conclusion 

We have developed the NCI method for gene regulatory 
network reconstruction from gene expression data. 
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Figure 5 Low rank matrices of the Block PCA model with various A,, The figure depicts the low rank matrices L of the Block PCA model 
with different values of X v In (A), (B), (C) and (D), the value of A., of the Block PCA model is chosen as 0.2/l 2 , 0.4A 2 , 0.6A 2 , and 0.7A 2 , respectively. 
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Table 2 Results of splitting algorithm, CVX and SDPNAL for solving the Block PCA model 







Points30 


Points50 


Points100 


Points200 


Points500 


funVal 


split 


146.4654 


301.4004 


793.5010 


2.1104e3 


8.0980e3 




CVX 


146.4654 


- 


- 


- 


- 




sdpna 


149.9589 


307.0276 


794.4018 


2.1 192e3 


8.1217e3 


\\D-L- E\\ 


split 


3.0041 e-5 


1 .0667e-5 


7.5166e-5 


1 .4488e-4 


3.7525e-4 




CVX 


1 .2590e-6 


- 


- 


- 


- 




sdpnal 


4.2797e-5 


2.341 1e-4 


9.1722e-4 


0.0033 


0.0057 


n- 


split 


29.5328 


54.8451 


1 1 8.4007 


260.8353 


720.8204 




CVX 


29.5323 


- 


- 


- 


- 




sdpnal 


29.8736 


55.2079 


119.9717 


261.0082 


707.0530 


(W, \L\) 


split 


631.1912 


1.5986e3 


7.0579e3 


2.7437e4 


1 .8056e5 




CVX 


631.1811 












sdpnal 


700.7054 


1.7995e3 


7.0395e3 


2.8083e4 


1 .8329e5 


l|£|h 


split 


451.1092 


1.2638e3 


4.6336e3 


1 .7925e4 


1.1079e5 




CVX 


451.1147 












sdpna 


447.5222 


1.2408e3 


4.6324e3 


1 .7854e4 


1.1081e5 


elapsed time(s) 


split 


4.88 


19.66 


80.43 


687.65 


1 4342e4 




CVX 


6.74 












sdpna 


78.99 


265.48 


1.3231e3 


6.9678e3 


5.1222e4 



"Points30" indicates the data with 30 points on a plane, and similarly for "Points50", "Points500". "funVal" represents the calculated objective function value of 
the Block PCA model. "-" indicates out of memory. "2.1 1 04e3" indicates 2.1 1 04 x 1 0 3 . The implementation of SDPNAL is version 0 [28] and CVX is version 1 .21 
[29]. The SDPT3 engine is chosen for the CVX solver. The two arguments for the splitting algorithm are set as follows: fj is constantly set 5 for all tests, B is set 
20/n for Points30, 200/n for Points50, PointslOO, Points200, and 500/n for Points500, where n is the number of points. 



Based on the convex programming technology, the NCI 
method has shown the capability to identify multiple 
subnetworks within a large-scale gene regulatory net- 
work. The NCI method includes two main steps. At the 
first step, the algorithm infers a gene regulatory net- 
work. At the second step, the algorithm estimates 
potential community structures. These two steps repeat 
until the algorithm terminates. Furthermore, we have 
proposed an efficient Block PCA method for exploring 
communities within a GRN and the splitting algorithm 
for the Block PCA model. Numerical experiments have 
validated the effectiveness of the NCI method in identi- 
fying GRNs and inferring the communities. 

Abbreviations 

GRN: gene regulatory network; NCI: network and community identification; 
Block PCA: block principal component analysis. 
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